---
title: "Colorectal Cancer Incidence Dashboard"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
library(plotly)
library(leaflet)
library(sf)
library(htmltools)
library(htmlwidgets)
```



### Dataset Description

<div style="font-size:18px;">

This dashboard uses data from the publicly available SEER colorectal cancer dataset, provided by the National Cancer Institute.  
Data were collected from hospital records and state cancer registries covering individuals aged 18 to 50 residing in Georgia.  
The dataset includes case counts and population data aggregated by county and race group.  
The analysis focuses on the time period from 2000 to 2010.  
County-level incidence rates (cases per 100,000 population) were calculated using standardized population counts and matched to county shapefiles by GEOID for visualization.

</div>
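
As a minimal sketch of the rate calculation described above, the example below aggregates race-group rows to the county level and scales to a rate per 100,000. The values are hypothetical toy data, not SEER figures, and the sketch assumes a crude (not age-adjusted) rate, which is what the dashboard code computes.

```r
# Toy data (hypothetical values, not taken from SEER):
# two counties, two race-group rows each
toy <- data.frame(
  GEOID      = c("13001", "13001", "13003", "13003"),
  Count      = c(3, 1, 0, 1),
  Population = c(12000, 8000, 5000, 5000)
)

# Sum cases and population within each county
county <- aggregate(cbind(Count, Population) ~ GEOID, data = toy, FUN = sum)

# Crude incidence rate per 100,000 population
county$Incidence_Rate <- county$Count / county$Population * 100000

print(county)
```

Summing counts and populations before dividing gives a population-weighted county rate; averaging the race-group rates directly would weight small groups too heavily.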

### Interactive Scatter Plot: Population vs Incidence Rate

```{r}
data_file <- "E:/Project/Data/processed_filtered_male_2000.txt"
if (!file.exists(data_file)) stop("Data file not found.")

incidence_raw <- read.table(data_file, header = TRUE, sep = "\t",
                            stringsAsFactors = FALSE, fileEncoding = "UTF-8")
# Strip thousands separators, then drop rows with missing or non-positive population
incidence_raw$Population <- as.numeric(gsub(",", "", incidence_raw$Population))
incidence_raw <- incidence_raw[!is.na(incidence_raw$Population) & incidence_raw$Population > 0, ]

# Collapse race-group rows to one row per county and compute the crude rate per 100,000
incidence_data <- incidence_raw %>%
  group_by(GEOID, county_name) %>%
  summarise(
    Total_Pop = sum(Population),
    Total_Count = sum(Count),
    .groups = "drop"
  ) %>%
  mutate(Incidence_Rate = (Total_Count / Total_Pop) * 100000)
```

```{r}
plot_ly(
  incidence_data,
  x = ~Total_Pop,
  y = ~Incidence_Rate,
  text = ~paste0(
    "County: ", county_name, "<br>",
    "Population: ", formatC(Total_Pop, format = "d", big.mark = ","), "<br>",
    "Incidence Rate: ", sprintf("%.2f", Incidence_Rate), " per 100K"
  ),
  type = 'scatter',
  mode = 'markers',
  marker = list(size = 10, color = 'blue'),
  hoverinfo = "text"
) %>%
  layout(
    title = list(text = "Incidence Rate vs Population", y = 0.95),
    autosize = TRUE,
    margin = list(t = 60, b = 100, l = 60, r = 40),
    xaxis = list(title = "Total Population"),
    yaxis = list(title = "Incidence Rate per 100,000")
  ) %>%
  config(responsive = TRUE)
```

***

This scatter plot shows the relationship between population size and colorectal cancer incidence rate in Georgia counties (2000–2010).

Key Takeaways

- The majority of counties have populations under 500,000 and incidence rates between 5 and 20 per 100,000.

- Larger counties show less variability in incidence rates, clustering around consistent values.

- Counties with smaller populations show greater variation, including outliers with very high rates. This likely reflects statistical instability: when the population is small, a difference of one or two cases shifts the rate substantially.
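
The instability noted in the last takeaway can be illustrated with exact Poisson confidence intervals from base R's `poisson.test()`. This is a sketch under assumptions: the two counties and their counts are hypothetical, and treating case counts as Poisson is a common convention rather than something the dashboard itself does.

```r
# Hypothetical counties with the same crude rate (20 per 100,000)
# but very different population sizes
cases <- c(small = 2, large = 200)
pop   <- c(small = 10000, large = 1000000)

# For each county: crude rate and exact 95% Poisson interval, per 100,000
rate_ci <- t(mapply(function(x, n) {
  ci <- poisson.test(x, T = n)$conf.int * 100000
  c(rate = x / n * 100000, lower = ci[1], upper = ci[2])
}, cases, pop))

print(round(rate_ci, 1))
```

Both rows share the same point estimate, but the interval for the small county is far wider, so an apparently extreme rate in a low-population county may not indicate a real difference in risk.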


### Interactive Choropleth Map: County-Level Incidence

```{r}
incidence_data$GEOID <- as.character(incidence_data$GEOID)

shp_file <- "E:/Project/Data/tl_2010_13_county10.shp"
if (!file.exists(shp_file)) stop("Shapefile not found.")

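# quiet = TRUE keeps the st_read() layer summary out of the rendered dashboard
georgia_counties <- st_read(shp_file, quiet = TRUE)
# The TIGER/Line shapefile uses NAD83; leaflet expects WGS84 (EPSG:4326)
georgia_counties <- st_transform(georgia_counties, 4326)
georgia_counties$GEOID10 <- as.character(georgia_counties$GEOID10)

# Attach rates to county geometries; counties without a rate are dropped from the map
g_map_data <- left_join(georgia_counties, incidence_data, by = c("GEOID10" = "GEOID"))
g_map_data <- g_map_data[!is.na(g_map_data$Incidence_Rate), ]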

labels <- sprintf(
  "<strong>County:</strong> %s<br/>
   <strong>Total Population:</strong> %s<br/>
   <strong>Incidence Rate:</strong> %.2f per 100K",
  g_map_data$county_name,
  formatC(g_map_data$Total_Pop, format = "d", big.mark = ","),
  g_map_data$Incidence_Rate
) %>% lapply(htmltools::HTML)

pal <- colorNumeric("Blues", domain = g_map_data$Incidence_Rate, na.color = "transparent")
```

```{r}
leaflet(g_map_data, options = leafletOptions(zoomControlPosition = "bottomright")) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillColor = ~pal(Incidence_Rate),
    weight = 1,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE
    ),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "13px",
      direction = "auto"
    )
  ) %>%
  addLegend(
    pal = pal,
    values = ~Incidence_Rate,
    opacity = 0.7,
    title = "Incidence Rate<br>(per 100,000)",
    position = "topright"
  ) %>%
  addControl(
    html = "<div style='font-size:20px;'><strong>Colorectal Cancer Incidence in Georgia</strong></div>",
    position = "bottomleft"
  )
```

***

This map shows county-level colorectal cancer incidence rates (2000–2010) in Georgia. The color intensity indicates the rate per 100,000 people.

Key Takeaways

- Colorectal cancer incidence is not evenly distributed across Georgia’s counties.

- Several counties in central and southwestern Georgia display noticeably higher incidence rates.

- The spatial distribution suggests that localized factors may influence colorectal cancer risk and deserve further investigation.
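
The map uses a continuous palette (`colorNumeric`), so color intensity scales linearly with the rate. When a few counties have extreme rates, a linear scale can push most counties into the palest shades. The base R sketch below, using hypothetical rates, contrasts linear scaling with quantile classes; leaflet's `colorQuantile()` is one option for the latter, though the dashboard does not use it.

```r
# Hypothetical county rates per 100,000 with one extreme outlier
rates <- c(5, 8, 10, 12, 15, 18, 20, 150)

# Linear scaling: position of each rate between the minimum and maximum,
# which is how a continuous palette places values along the color ramp
linear_pos <- (rates - min(rates)) / (max(rates) - min(rates))

# Quantile classes: four bins holding equal numbers of counties
breaks   <- quantile(rates, probs = seq(0, 1, 0.25))
quartile <- cut(rates, breaks = breaks, include.lowest = TRUE, labels = FALSE)

print(data.frame(rates, linear_pos = round(linear_pos, 2), quartile))
```

With linear scaling, every county except the outlier falls in the lowest fifth of the color ramp, while quantile classes spread the counties evenly across four shades at the cost of hiding how extreme the outlier is.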


### Real-World Impact

<div style="font-size:18px;">

This dashboard helps identify Georgia counties with unusually high colorectal cancer incidence rates, guiding targeted public health action.  

It provides a visual tool to support resource allocation, community outreach, and the investigation of geographic health disparities.

</div>

### Link to GitHub Repository

<div style="font-size:18px;">

The full source code is embedded in this document and available for review. The visualizations are generated using plotly and leaflet, ensuring interactivity and usability.

🔗 View project source code on <a href="https://github.com/rocinante0v0/Interactive-Cancer-Incidence-Visualizations" target="_blank">GitHub</a>

</div>